Velvet is a set of algorithms that manipulate de Bruijn graphs for genomic and de novo transcriptomic sequence assembly[1][2][3]. It was designed for short-read sequencing technologies, such as Solexa or 454 Sequencing, and was developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute. The tool takes in short read sequences, removes errors, and then produces high-quality unique contigs. It then uses paired-end read and long-read information, when available, to resolve the repeated regions between contigs. It has also been implemented inside the commercial package Geneious Server.
For each k-mer observed in the set of reads (together with its reverse complement), the hash table records the ID of the first read encountered containing that k-mer and the position of its occurrence within that read. A second database is created with the opposite information: for each short read, it records which of its original k-mers are overlapped by subsequent reads.
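The two lookup structures described above can be illustrated with a minimal Python sketch. It is not Velvet's actual implementation: the function names (`reverse_complement`, `canonical`, `index_reads`), the use of plain dictionaries, and the choice to store a k-mer and its reverse complement under one shared key are all assumptions made for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


def canonical(kmer):
    """Assumed convention: a k-mer and its reverse complement share one key,
    chosen as the lexicographically smaller of the two strings."""
    return min(kmer, reverse_complement(kmer))


def index_reads(reads, k):
    """Build the two illustrative lookup tables.

    first_seen: k-mer key -> (read ID, offset) of its first occurrence.
    overlapped: read ID -> offsets of that read's original k-mers that
                are seen again in a later read.
    """
    first_seen = {}
    overlapped = {read_id: set() for read_id in range(len(reads))}
    for read_id, read in enumerate(reads):
        for pos in range(len(read) - k + 1):
            key = canonical(read[pos:pos + k])
            if key not in first_seen:
                # First time this k-mer (or its reverse complement) appears.
                first_seen[key] = (read_id, pos)
            else:
                owner, owner_pos = first_seen[key]
                if owner != read_id:
                    # A later read overlaps a k-mer first seen in `owner`.
                    overlapped[owner].add(owner_pos)
    return first_seen, overlapped


reads = ["AACCG", "CCGTT"]
first_seen, overlapped = index_reads(reads, 3)
print(first_seen)
print(overlapped)
```

In this toy input, the second read shares `CCG` with the first read directly and `GTT` through its reverse complement `AAC`, so both offsets 0 and 2 of the first read are marked as overlapped.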
Whenever a node A has only one outgoing arc that points to another node B that has only one ingoing arc, the two nodes are merged.
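The merging rule above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than Velvet's own code: the graph is assumed to be a dictionary from node ID to a set of successor IDs, each node carries a string label, and merging is assumed to concatenate the labels of A and B. The name `simplify` is hypothetical.

```python
def simplify(succ, label):
    """Merge A and B whenever A has exactly one outgoing arc, to B,
    and B has exactly one incoming arc. Returns new (succ, label) dicts."""
    # Work on copies and make sure every node, including sinks, has an entry.
    nodes = set(succ) | {b for outs in succ.values() for b in outs}
    succ = {n: set(succ.get(n, ())) for n in nodes}
    label = dict(label)

    # Build the predecessor sets needed to count incoming arcs.
    pred = {n: set() for n in nodes}
    for a, outs in succ.items():
        for b in outs:
            pred[b].add(a)

    changed = True
    while changed:
        changed = False
        for a in list(succ):
            if a not in succ or len(succ[a]) != 1:
                continue  # a was merged away, or does not have one outgoing arc
            (b,) = succ[a]
            if b == a or len(pred[b]) != 1:
                continue  # self-loop, or B has other incoming arcs
            # Merge B into A: A takes over B's label and outgoing arcs.
            label[a] += label.pop(b)
            succ[a] = succ.pop(b)
            del pred[b]
            for c in succ[a]:
                pred[c].discard(b)
                pred[c].add(a)
            changed = True
    return succ, label


graph = {1: {2}, 2: {3}, 3: {4, 5}, 4: set(), 5: set()}
labels = {1: "A", 2: "C", 3: "G", 4: "T", 5: "A"}
print(simplify(graph, labels))
```

Here the chain 1 → 2 → 3 collapses into a single node labelled `ACG`, while the branch at node 3 is preserved because that node has two outgoing arcs.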
Errors can arise either from the sequencing process or from polymorphisms.